Multi-Armed Bandits on Unit Interval Graphs

Authors

  • Xiao Xu
  • Sattar Vakili
  • Qing Zhao
  • Ananthram Swami
Abstract

An online learning problem with side information on the similarity and dissimilarity across different actions is considered. The problem is formulated as a stochastic multiarmed bandit problem with a graph-structured learning space. Each node in the graph represents an arm in the bandit problem and an edge between two nodes represents closeness in their mean rewards. It is shown that the resulting graph is a unit interval graph. A hierarchical learning policy is developed that offers sublinear scaling of regret with the size of the learning space by fully exploiting the side information through an offline reduction of the learning space and online aggregation of reward observations from similar arms. The order optimality of the proposed policy in terms of both the size of the learning space and the length of the time horizon is established through a matching lower bound on regret. It is further shown that when the mean rewards are bounded, complete learning with bounded regret over an infinite time horizon can be achieved. An extension to the case with only partial information on arm similarity and dissimilarity is also discussed.
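The policy details are in the full paper; purely as a hedged sketch of the online-aggregation idea, the snippet below runs a UCB-style rule that pools reward statistics across graph neighbors before computing each arm's index. The Bernoulli rewards, the 0.15 edge threshold, and the pooling rule are illustrative assumptions, not the paper's hierarchical construction.

```python
import math
import random

def neighbor_ucb(means, adj, horizon, seed=0):
    """UCB-style sketch on a unit interval graph: reward statistics are
    pooled across an arm's neighbors before forming its index. This is
    an illustrative scheme, NOT the paper's hierarchical policy."""
    rng = random.Random(seed)
    n = len(means)
    counts, sums = [0] * n, [0.0] * n
    regret, best = 0.0, max(means)
    for t in range(1, horizon + 1):
        def index(i):
            group = [i] + adj[i]  # an edge certifies similar means
            m = sum(counts[j] for j in group)
            if m == 0:
                return float("inf")  # force initial exploration
            mean = sum(sums[j] for j in group) / m
            return mean + math.sqrt(2 * math.log(t) / m)
        arm = max(range(n), key=index)
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli rewards (assumed)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret

# Arms on [0, 1]; an edge joins arms whose means differ by less than 0.15,
# so the resulting graph is a unit interval graph.
mu = [0.1 * k for k in range(10)]
adj = [[j for j in range(10) if j != i and abs(mu[i] - mu[j]) < 0.15] for i in range(10)]
print(neighbor_ucb(mu, adj, horizon=5000))
```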


Related resources

On Top-k Selection in Multi-Armed Bandits and Hidden Bipartite Graphs

This paper discusses how to efficiently choose from n unknown distributions the k ones whose means are the greatest by a certain metric, up to a small relative error. We study the topic under two standard settings—multi-armed bandits and hidden bipartite graphs—which differ in the nature of the input distributions. In the former setting, each distribution can be sampled (in the i.i.d. manner) a...
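As a generic illustration of top-k selection under the bandit access model, the sketch below uses successive elimination with a standard confidence radius; the batch size, radius, and stopping rule are textbook-style assumptions, not the cited paper's algorithm.

```python
import math
import random

def pac_top_k(sample, n, k, delta=0.05, batch=200, rounds=20):
    """Successive-elimination sketch for PAC top-k arm selection.
    `sample(i)` draws one i.i.d. reward in [0, 1] from distribution i.
    A generic textbook scheme, not the cited paper's algorithm."""
    alive = set(range(n))
    counts, sums = [0] * n, [0.0] * n
    for r in range(1, rounds + 1):
        for i in alive:
            sums[i] += sum(sample(i) for _ in range(batch))
            counts[i] += batch
        rad = math.sqrt(math.log(2 * n * r / delta) / (2 * batch * r))
        means = {i: sums[i] / counts[i] for i in alive}
        order = sorted(alive, key=means.get, reverse=True)
        kth_mean = means[order[k - 1]]
        # Drop any arm whose upper confidence bound falls below the
        # lower confidence bound of the current k-th best arm.
        alive = {i for i in alive if means[i] + rad >= kth_mean - rad}
        if len(alive) <= k:
            break
    return sorted(alive, key=means.get, reverse=True)[:k]

# Example: 10 Bernoulli arms, pick the top 3.
rng = random.Random(1)
p = [0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.7, 0.8, 0.9]
print(pac_top_k(lambda i: 1.0 if rng.random() < p[i] else 0.0, n=10, k=3))
```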


Active Search and Bandits on Graphs using Sigma-Optimality

Many modern information access problems involve highly complex patterns that cannot be handled by traditional keyword based search. Active Search is an emerging paradigm that helps users quickly find relevant information by efficiently collecting and learning from user feedback. We consider active search on graphs, where the nodes represent the set of instances users want to search over and the...
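As a rough illustration only: active search on a graph is often modeled with a Gaussian field whose covariance is a regularized inverse graph Laplacian, and a greedy Σ-optimality rule queries the node that most reduces the total sum of posterior covariances. The sketch below follows that general recipe; the regularization and noise constants are assumed, and this is not the cited paper's exact procedure.

```python
import numpy as np

def sigma_optimality_order(adj, budget, noise=0.1, reg=0.01):
    """Greedy Sigma-optimality sketch on a graph Gaussian field.
    adj: symmetric 0/1 adjacency matrix. Returns `budget` query nodes.
    Covariance is taken as (Laplacian + reg*I)^{-1}; constants assumed."""
    n = adj.shape[0]
    lap = np.diag(adj.sum(axis=1)) - adj
    cov = np.linalg.inv(lap + reg * np.eye(n))
    picked = []
    for _ in range(budget):
        # Greedy score: squared column sum of the posterior covariance,
        # normalized by the node's noisy variance (reduction in the
        # total covariance sum from observing that node).
        col_sums = cov.sum(axis=0)
        scores = col_sums ** 2 / (np.diag(cov) + noise ** 2)
        scores[picked] = -np.inf  # never re-pick a queried node
        v = int(np.argmax(scores))
        picked.append(v)
        # Rank-one posterior update after observing node v.
        c = cov[:, v]
        cov = cov - np.outer(c, c) / (cov[v, v] + noise ** 2)
    return picked

# Example: a 6-node path graph.
A = np.zeros((6, 6))
for i in range(5):
    A[i, i + 1] = A[i + 1, i] = 1
print(sigma_optimality_order(A, budget=3))
```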


Unimodal Bandits

We consider multiarmed bandit problems where the expected reward is unimodal over partially ordered arms. In particular, the arms may belong to a continuous interval or correspond to vertices in a graph, where the graph structure represents similarity in rewards. The unimodality assumption has an important advantage: we can determine if a given arm is optimal by sampling the possible directions...
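A minimal sketch of the locality the excerpt points to: under unimodality over a line of arms, an arm can be certified optimal by comparing it only with its immediate neighbors, and otherwise one moves toward the better neighbor. The batch size and stopping rule below are assumed choices, not the cited paper's algorithm.

```python
import math
import random

def unimodal_climb(sample, n, delta=0.05, batch=500):
    """Hill-climbing sketch for a unimodal bandit on arms 0..n-1.
    `sample(i)` draws an i.i.d. reward in [0, 1] from arm i.
    Compares an arm only with its two neighbors; under unimodality,
    a local optimum is global."""
    cur = n // 2
    while True:
        cand = [i for i in (cur - 1, cur, cur + 1) if 0 <= i < n]
        means = {i: sum(sample(i) for _ in range(batch)) / batch for i in cand}
        rad = math.sqrt(math.log(2 * len(cand) / delta) / (2 * batch))
        best = max(cand, key=means.get)
        if best == cur:
            return cur  # beats both neighbors: certified optimal
        if means[best] - means[cur] <= 2 * rad:
            return best  # differences within noise; stop at the better estimate
        cur = best  # move toward the better neighbor

rng = random.Random(0)
mu = [0.2, 0.4, 0.6, 0.8, 0.6, 0.4]  # unimodal with peak at arm 3
print(unimodal_climb(lambda i: 1.0 if rng.random() < mu[i] else 0.0, n=6))
```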


Modal Bandits

Analyses of multi-armed bandits primarily presume that the value of an arm is its expected reward. We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions.
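The excerpt replaces the mean with the mode as an arm's value. Purely as an assumed illustration, the sketch below bins each arm's rewards, treats the most frequent bin's center as the arm's value, and adds a UCB-style exploration bonus; this is not the cited paper's estimator or index.

```python
import math
import random
from collections import Counter

def modal_ucb(sample, n, horizon, bins=10):
    """UCB-flavored sketch where an arm's value is the mode of its
    empirical reward histogram rather than its mean (assumed scheme,
    not the cited paper's algorithm). Rewards lie in [0, 1]."""
    hist = [Counter() for _ in range(n)]
    counts = [0] * n
    for t in range(1, horizon + 1):
        def index(i):
            if counts[i] == 0:
                return float("inf")  # force initial exploration
            b, _ = hist[i].most_common(1)[0]
            mode = (b + 0.5) / bins  # center of the most frequent bin
            return mode + math.sqrt(2 * math.log(t) / counts[i])
        arm = max(range(n), key=index)
        r = sample(arm)
        hist[arm][min(int(r * bins), bins - 1)] += 1
        counts[arm] += 1
    return max(range(n), key=lambda i: counts[i])

# Example: arm 1 has the lower mean but the higher mode.
rng = random.Random(1)
def draw(i):
    if i == 0:
        return rng.uniform(0.5, 0.7)   # mean 0.6, mode near 0.6
    if rng.random() < 0.55:
        return 0.9                     # mode near 0.9
    return rng.uniform(0.0, 0.1)       # overall mean roughly 0.52
print(modal_ucb(draw, n=2, horizon=3000))
```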


A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements

We generalise classical multi-armed and restless bandits to allow for the distribution of a (fixed amount of a) divisible resource among the constituent bandits at each decision point. Bandit activation consumes amounts of the available resource which may vary by bandit and state. Any collection of bandits may be activated at any decision epoch provided they do not consume more resource than is...
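As a loose illustration of the per-epoch activation constraint described above: with a divisible resource and per-bandit consumption, each decision epoch amounts to choosing a feasible set of bandits, for instance greedily by an index-to-consumption ratio. The index values and consumptions below are placeholders; computing the generalized Gittins index is the paper's subject and is not reproduced here.

```python
def activate(indices, consumption, capacity):
    """Greedy sketch: choose bandits to activate this epoch by
    index-per-unit-resource, subject to total consumption <= capacity.
    A knapsack-style heuristic for illustration only; the paper's
    generalized Gittins indices are not computed here."""
    order = sorted(range(len(indices)),
                   key=lambda i: indices[i] / consumption[i], reverse=True)
    chosen, used = [], 0.0
    for i in order:
        if used + consumption[i] <= capacity:
            chosen.append(i)
            used += consumption[i]
    return chosen

# Placeholder index values and resource consumptions for 5 bandits.
print(activate(indices=[3.0, 2.5, 4.0, 1.0, 2.0],
               consumption=[2.0, 1.0, 3.0, 0.5, 1.5],
               capacity=4.0))
```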



Journal:
  • CoRR

Volume abs/1802.04339  Issue 

Pages  -

Publication date 2018